Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization

نویسندگان

چکیده

Averaging scheme has attracted extensive attention in deep learning as well traditional machine learning. It achieves theoretically optimal convergence and also improves the empirical model performance. However, there is still a lack of sufficient analysis for strongly convex optimization. Typically, about last iterate gradient descent methods, which referred to individual convergence, fails attain its optimality due existence logarithmic factor. In order remove this factor, we first develop averaging (GDA), general projection-based dual algorithm setting. We further present primal-dual cases (SC-PDA), where primal schemes are simultaneously utilized. prove that GDA yields rate terms output averaging, while SC-PDA derives convergence. Several experiments on SVMs models validate correctness theoretical effectiveness algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?

Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standa...

متن کامل

Adding vs. Averaging in Distributed Primal-Dual Optimization

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. Our framework...

متن کامل

Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization

We propose a new randomized coordinate descent method for a convex optimization template with broad applications. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As a result, our method features the first convergence rate guarantees among the coordinate descent ...

متن کامل

Efficient Stochastic Gradient Descent for Strongly Convex Optimization

We motivate this study from a recent work on a stochastic gradient descent (SGD) method with only one projection (Mahdavi et al., 2012), which aims at alleviating the computational bottleneck of the standard SGD method in performing the projection at each iteration, and enjoys an O(log T/T ) convergence rate for strongly convex optimization. In this paper, we make further contributions along th...

متن کامل

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T )/T ), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T ) rate. This mig...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Proceedings of the ... AAAI Conference on Artificial Intelligence

سال: 2021

ISSN: ['2159-5399', '2374-3468']

DOI: https://doi.org/10.1609/aaai.v35i11.17183